A large body of recent research pursues the far-reaching goal of finding training methods for deep neural networks that can serve as alternatives to backpropagation (BP). A prominent example is predictive coding (PC), a neuroscience-inspired method that performs inference on hierarchical Gaussian generative models. These methods, however, fail to keep up with modern neural networks, as they are unable to replicate the dynamics of complex layers and activation functions. In this work, we solve this problem by generalizing PC to arbitrary probability distributions, enabling the training of architectures, such as transformers, that are hard to approximate with only Gaussian assumptions. We perform three experimental analyses. First, we study the gap between our method and the standard formulation of PC on multiple toy examples. Second, we test our method on variational autoencoders, where it reaches the same reconstruction quality as BP. Third, we show that our method allows us to train transformer networks and achieve performance comparable to BP on conditional language models. More broadly, this method allows neuroscience-inspired learning to be applied to multiple domains, since the internal distributions can be flexibly adapted to the data, tasks, and architectures used.
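The standard Gaussian formulation that this work generalizes can be pictured with a short sketch. Below is a minimal NumPy sketch of predictive coding on a two-weight hierarchy: value nodes are relaxed by gradient descent on the squared prediction errors, and weights then receive local Hebbian-like updates. The layer sizes, tanh nonlinearity, and learning rates are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-layer hierarchy: x[0] is the observed data, x[2] is the top latent.
# Predictions flow top-down, mu_l = W_l @ f(x_{l+1}); errors eps_l = x_l - mu_l.
sizes = [10, 8, 6]                                   # illustrative layer widths (assumption)
W = [rng.normal(0.0, 0.1, (sizes[l], sizes[l + 1])) for l in range(2)]
f = np.tanh
df = lambda a: 1.0 - np.tanh(a) ** 2

def pc_step(x_data, T=50, lr_x=0.1, lr_w=0.01):
    # Clamp the bottom layer to the observation; initialize hidden value nodes.
    x = [x_data.copy()] + [rng.normal(0.0, 0.1, s) for s in sizes[1:]]
    for _ in range(T):                               # inference: relax the value nodes
        mu = [W[l] @ f(x[l + 1]) for l in range(2)]
        eps = [x[l] - mu[l] for l in range(2)]       # Gaussian energy F = sum_l ||eps_l||^2 / 2
        for l in range(1, 3):
            grad = eps[l] if l < 2 else 0.0          # local error at layer l (none at the top)
            grad = grad - df(x[l]) * (W[l - 1].T @ eps[l - 1])
            x[l] = x[l] - lr_x * grad
    mu = [W[l] @ f(x[l + 1]) for l in range(2)]      # learning: local weight updates
    eps = [x[l] - mu[l] for l in range(2)]
    for l in range(2):
        W[l] += lr_w * np.outer(eps[l], f(x[l + 1]))
    return 0.5 * sum(float(e @ e) for e in eps)      # energy after relaxation

print(pc_step(rng.normal(size=10)))
```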
Recently, there has been growing interest in models that generate natural language explanations (NLEs). However, training models to provide NLEs requires acquiring task-specific NLEs, which is time- and resource-consuming. A potential solution is few-shot out-of-domain transfer of NLEs from a domain with abundant NLEs to a domain where NLEs are scarce but labels are potentially plentiful. In this work, we introduce three vanilla approaches to few-shot transfer learning for the case of few NLEs but abundant labels, as well as an adaptation of an existing vanilla fine-tuning method. We transfer explainability from the natural language inference domain, where a large dataset of human-written NLEs exists (e-SNLI), to the domains of (1) pronoun resolution, where we introduce a small dataset of NLEs on top of the WinoGrande dataset (small-e-WinoGrande), and (2) commonsense validation (ComVE). Our results show that transferring NLEs outperforms the single-task approaches, and we establish the best strategies among the four identified training regimes. We also investigate the scalability of the best methods in terms of training data and model size.
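As a rough illustration of the vanilla sequential fine-tuning idea described above (train on the NLE-rich parent task, then on the few-shot child task), here is a hedged sketch using a T5-style seq2seq model from the transformers library. The prompt formats, toy examples, and hyperparameters are assumptions for illustration, not the paper's exact training regimes.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
opt = torch.optim.AdamW(model.parameters(), lr=3e-5)

def fine_tune(pairs, epochs=1):
    """pairs: list of (input_text, target_text) explanation examples."""
    model.train()
    for _ in range(epochs):
        for src, tgt in pairs:
            batch = tok(src, return_tensors="pt", truncation=True)
            labels = tok(tgt, return_tensors="pt", truncation=True).input_ids
            loss = model(**batch, labels=labels).loss   # standard seq2seq LM loss
            loss.backward()
            opt.step()
            opt.zero_grad()

# Stage 1: parent task with abundant NLEs (e.g. e-SNLI-style premise/hypothesis pairs).
parent = [("explain nli premise: A dog runs. hypothesis: An animal moves.",
           "entailment, because a dog is an animal and running is a form of moving")]
# Stage 2: few-shot child task with scarce NLEs (e.g. a ComVE-style example).
child = [("explain comve: He put an elephant into the fridge.",
          "against common sense, because an elephant is far too large to fit in a fridge")]
fine_tune(parent)
fine_tune(child)
```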
Due to its importance in facial behaviour analysis, facial action unit (AU) detection has attracted increasing attention from the research community. Leveraging the online knowledge distillation framework, we propose the "FANTrans" method for AU detection. Our model consists of a hybrid network of convolution and transformer blocks to learn per-AU features and to model AU co-occurrences. The model uses a pre-trained face alignment network as the feature extractor. After further transformation by a small learnable add-on convolutional subnet, the per-AU features are fed into transformer blocks to enhance their representation. As multiple AUs often appear together, we propose a learnable attention drop mechanism in the transformer block to learn the correlation between the features of different AUs. We also design a classifier that predicts AU presence by considering all AUs' features, to explicitly capture label dependencies. Finally, we adapt online knowledge distillation to the training stage of this task, further improving the model's performance. Experiments on the BP4D and DISFA datasets demonstrate the effectiveness of the proposed method.
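The per-AU transformer head can be pictured with a short PyTorch sketch: each AU gets its own learnable token derived from the backbone features, transformer blocks model AU co-occurrence, and a classifier predicts all AUs jointly from the concatenated tokens. The paper's learnable attention-drop mechanism and online distillation are not reproduced here; the dimensions and the use of a standard nn.TransformerEncoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AUHead(nn.Module):
    def __init__(self, num_aus=12, feat_dim=256, depth=2, heads=4):
        super().__init__()
        # One learnable projection per AU, turning pooled backbone features into per-AU tokens.
        self.per_au = nn.ModuleList(nn.Linear(feat_dim, feat_dim) for _ in range(num_aus))
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=heads,
                                           batch_first=True, dropout=0.1)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        # The classifier sees the concatenation of all AU tokens, capturing label dependencies.
        self.classifier = nn.Linear(num_aus * feat_dim, num_aus)

    def forward(self, x):                               # x: (B, feat_dim) pooled backbone features
        tokens = torch.stack([proj(x) for proj in self.per_au], dim=1)  # (B, num_aus, D)
        tokens = self.blocks(tokens)                    # model AU co-occurrence via self-attention
        return self.classifier(tokens.flatten(1))       # (B, num_aus) multi-label AU logits

head = AUHead()
print(head(torch.randn(2, 256)).shape)                  # torch.Size([2, 12])
```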
Quadruped robots are often equipped with additional arms for manipulation, which negatively impacts price and weight. On the other hand, the requirements of legged locomotion mean that the legs of such robots often already possess the torque and precision needed for manipulation. In this paper, we present a novel design for a small quadruped robot equipped with two leg-mounted manipulators inspired by the forelimbs of crustaceans and knuckle-walkers. By exploiting the actuators already present in the legs, we achieve manipulation with only three additional motors per limb. The design allows the use of small and inexpensive actuators relative to the leg motors, further reducing cost and weight. Thanks to an integrated cable/pulley system, the impact on the legs' moment of inertia is minimal. As we show in a suite of teleoperation experiments, the robot can perform single- and dual-limb manipulation and transition between manipulation modes. The proposed design performs similarly to an additional arm while weighing and costing five times less per manipulator, and it can complete tasks that require two manipulators.
Street-level imagery holds significant potential for scaling up in-situ data collection. This is enabled by combining cheap, high-quality cameras with recent advances in deep learning compute solutions to derive relevant thematic information. We present a framework for collecting and extracting crop type and phenological information from street-level imagery using computer vision. During the 2018 growing season, high-definition pictures were captured with side-looking action cameras in the Flevoland province of the Netherlands. Each month from March to October, a fixed 200 km route was surveyed, collecting one picture per second and resulting in a total of 400,000 geotagged pictures. At 220 specific parcel locations, in-field crop observations were recorded for 17 crop types. Furthermore, the time span included specific pre-emergence parcel stages, such as differently cultivated bare soil for spring and summer crops, as well as post-harvest cultivation practices, e.g., green manuring and catch crops. Classification was performed with TensorFlow using a well-known image recognition model, based on transfer learning with convolutional neural networks (MobileNet). A hypertuning methodology was developed to obtain the best-performing model among 160 models. This best model was applied to an independent inference set, discriminating crop type with a macro F1 score of 88.1%, and 86.9% at the parcel level. The potential and caveats of the approach are discussed, along with practical considerations for implementation and improvement. The proposed framework speeds up high-quality in-situ data collection and suggests avenues for massive data collection via automated classification using computer vision.
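The classification setup lends itself to a brief TensorFlow/Keras sketch of MobileNet transfer learning: a frozen ImageNet backbone with a 17-class softmax head. The input resolution, head layout, and optimizer settings are illustrative assumptions rather than the study's hypertuned configuration.

```python
import tensorflow as tf

NUM_CLASSES = 17                               # crop/parcel classes reported in the text

base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet", pooling="avg")
base.trainable = False                         # transfer learning: freeze ImageNet features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),              # assumed regularization
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds / val_ds would be tf.data.Dataset pipelines of geotagged street-level
# images resized to 224x224 with integer crop-type labels (not included here).
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```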